The dataset our group used is from PetFinder, a Malaysian online community where pet rescuers and adopters help stray pets find their future homes. Our objective is to find patterns and trends among stray pets in Malaysia through analysis and visualization.
*PetID - Unique hash ID of pet profile
*AdoptionSpeed - Categorical speed of adoption; lower is faster. This is the value to predict. 0 = adopted on the same day as listed; 1 = adopted between 1 and 7 days (1st week) after being listed; 2 = adopted between 8 and 30 days (1st month) after being listed; 3 = adopted between 31 and 90 days (2nd & 3rd month) after being listed; 4 = no adoption after 100 days of being listed. (There are no pets in this dataset that waited between 90 and 100 days.)
*Type - Type of animal (1 = Dog, 2 = Cat)
*Name - Name of pet (Empty if not named)
*Age - Age of pet when listed, in months
*Breed1 - Primary breed of pet
*Breed2 - Secondary breed of pet, if pet is of mixed breed
*Gender - Gender of pet (1 = Male, 2 = Female, 3 = Mixed, if the profile represents a group of pets)
*Color1 - Main Color of pet
*Color2 - Second Color of pet
*MaturitySize - Size at maturity (1 = Small, 2 = Medium, 3 = Large, 4 = Extra Large, 0 = Not Specified)
*FurLength - Fur length (1 = Short, 2 = Medium, 3 = Long, 0 = Not Specified)
*Vaccinated - Pet has been vaccinated (1 = Yes, 2 = No, 3 = Not Sure)
*Dewormed - Pet has been dewormed (1 = Yes, 2 = No, 3 = Not Sure)
*Sterilized - Pet has been spayed / neutered (1 = Yes, 2 = No, 3 = Not Sure)
*Health - Health Condition (1 = Healthy, 2 = Minor Injury, 3 = Serious Injury, 0 = Not Specified)
*Quantity - Number of pets represented in profile
*Fee - Adoption fee (0 = Free)
*State - State location in Malaysia
*RescuerID - Unique hash ID of rescuer
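Most of the fields above are stored as integer codes, so it helps to decode them before plotting. The following is a minimal sketch in Python (an assumption on our part; the report itself was produced in R). The dictionary names and the `decode` helper are hypothetical, the example row is made up, and the labels are copied from the list above.

```python
# Code-to-label maps copied from the data dictionary above.
ADOPTION_SPEED = {
    0: "same day",
    1: "1-7 days",
    2: "8-30 days",
    3: "31-90 days",
    4: "no adoption after 100 days",
}
PET_TYPE = {1: "Dog", 2: "Cat"}
YES_NO_UNSURE = {1: "Yes", 2: "No", 3: "Not Sure"}


def decode(row):
    """Return a copy of `row` with coded fields replaced by readable labels."""
    out = dict(row)
    out["Type"] = PET_TYPE.get(row["Type"], "Unknown")
    out["AdoptionSpeed"] = ADOPTION_SPEED.get(row["AdoptionSpeed"], "Unknown")
    out["Vaccinated"] = YES_NO_UNSURE.get(row["Vaccinated"], "Unknown")
    return out


# Made-up example row, not taken from the dataset.
example = {"Type": 2, "AdoptionSpeed": 1, "Vaccinated": 3}
decoded = decode(example)
print(decoded)
```

Unknown codes fall back to "Unknown" rather than raising an error, which keeps a plotting pipeline running when a profile contains an unexpected value.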
## Warning in randomForest.default(m, y, ...): The response has five or fewer
## unique values. Are you sure you want to do regression?
This feature importance plot shows which features matter most for predicting adoption speed. We first tuned a random forest model. The results show that Age and photo amount are the most important factors for predicting adoption speed, followed by breed and color. In contrast, video amount and Type appear to matter little for the speed of adoption.
From this correlation matrix we can see that most features are not correlated, which is good. However, Sterilized and Vaccinated, Gender and Quantity, and Vaccinated and Dewormed each have a correlation of around 0.7. This makes sense because sterilization, vaccination and deworming are a series of health checks, so most of them are done together.
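To make the correlation values concrete, here is a minimal Python sketch of the Pearson coefficient for one pair of columns. We assume the matrix holds Pearson correlations, which the report does not state. The `pearson` helper is hypothetical and the input values are made up, not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up coded values (1 = Yes, 2 = No, 3 = Not Sure).
vaccinated = [1, 1, 2, 2, 3, 1, 2, 3]
dewormed = [1, 1, 2, 1, 3, 1, 2, 3]
r = pearson(vaccinated, dewormed)
print(round(r, 2))
```

Because these columns are categorical codes rather than measurements, the coefficient is only a rough indicator of how often two answers coincide.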
Few pets are adopted on the same day their information is posted. A large number of pets are still not adopted after 100 days. Half of the stray pets are adopted within one month.
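A statement such as "half of the pets are adopted within one month" corresponds to the cumulative share of AdoptionSpeed codes 0, 1 and 2. The Python sketch below shows one way to compute it. The list of codes is made up for illustration, so the resulting share is not the dataset's actual figure.

```python
from collections import Counter
from itertools import accumulate

# Made-up AdoptionSpeed codes, not the real dataset.
speeds = [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]

counts = Counter(speeds)
total = len(speeds)

# Count per code in order 0..4, then the running share of all pets.
ordered = [counts.get(code, 0) for code in range(5)]
cumulative_share = [c / total for c in accumulate(ordered)]

# Codes 0, 1 and 2 together cover adoption within the first month.
share_within_month = cumulative_share[2]
print(ordered, share_within_month)
```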
From the graph, we notice that the total number of stray dogs is larger than the total number of stray cats. In addition, female pets form the largest group among both cats and dogs. The number of stray male pets is almost the same for cats and dogs. The mixed gender category represents a group of pets listed in one adoption profile.
Within the first month, the total number of adopted cats and dogs increases as the number of days increases. However, there is a decrease in “3” (between 31 and 90 days) for both cats and dogs, which means that the number of adopted pets decreases after 30 days. If pets are not adopted within 90 days, they may not be adopted at all. In the week after the pets' information is posted, more cats than dogs are adopted. After that week, however, fewer cats than dogs are adopted. Also, among the pets that do not find a home, dogs outnumber cats.
In this chart, the number of female pets is larger than the number of male pets at every adoption speed, which is consistent with the conclusion we mentioned in “Type vs Gender”. Furthermore, there is a decrease in “3” (between 31 and 90 days) in both the female and male groups.
In this chart, we treat adoption speed as a numerical variable to examine the relationship between the total number of homeless pets and the adoption speed in each state. We found that the more stray pets a state has, the lower its average AdoptionSpeed value tends to be (lower values mean faster adoption): in this plot, the two largest points have a relatively low average adoption speed. In addition, “Selangor” has the largest number of stray pets.
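The per-state summary behind such a chart can be sketched as a group-by that counts pets and averages the AdoptionSpeed code. This is a minimal Python illustration. The `summarize_by_state` helper is hypothetical and the records are made up (state names are used here for readability).

```python
from collections import defaultdict

# Made-up (state, AdoptionSpeed) pairs, not the real dataset.
records = [
    ("Selangor", 2), ("Selangor", 3), ("Selangor", 1),
    ("Kuala Lumpur", 2), ("Kuala Lumpur", 4),
    ("Johor", 4),
]


def summarize_by_state(rows):
    """Return {state: {"pets": count, "mean_speed": average code}}."""
    totals = defaultdict(lambda: [0, 0])  # state -> [count, sum of codes]
    for state, speed in rows:
        totals[state][0] += 1
        totals[state][1] += speed
    return {
        state: {"pets": n, "mean_speed": total / n}
        for state, (n, total) in totals.items()
    }


summary = summarize_by_state(records)
for state, stats in summary.items():
    print(state, stats)
```

Averaging the codes treats the categories as evenly spaced, which they are not (category 3 spans 60 days, category 0 a single day), so the mean is best read as a rough ranking.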
Heatmap: Total Number of Homeless Pets in Different States.
The lightest blue area is the state of “Selangor”, which has the highest number of homeless pets. The number of stray pets in the left landmass (Peninsular Malaysia) is larger than in the right one (East Malaysia, on Borneo). Grey areas mean that we do not have enough data for that state.
Most of these stray pets are younger than 5 years old, and half of them have either a minor or a serious injury. There is a large number of healthy homeless pets under 3 years old among both cats and dogs. Under 5 years old, the number of cats with a minor or serious injury is larger than the number of dogs with such injuries.
A large number of stray pets have been neither vaccinated nor dewormed. As health condition worsens, the number of pets that have been both vaccinated and dewormed decreases, and the number of stray pets marked “not sure” for both vaccination and deworming increases.
The plot shows that most pets (both cats and dogs) have two colors. Single-color and three-color pets occur in similar numbers.
For most stray pets that were adopted, adopters did not need to pay a fee. The variance of the adoption fee for cats is the same across adoption speeds. However, the variance of the fee for dogs is larger at adoption speeds 1 and 3, which means that the range of fees is wider when dogs were adopted within a week or after a month.
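The comparison of fee variance across groups can be sketched as follows, assuming the variance is computed separately for each combination of Type and AdoptionSpeed. The `fee_variance` helper is hypothetical and the rows are made up. We use the population variance from Python's standard library, whereas the report does not say which variance estimator was used.

```python
from collections import defaultdict
from statistics import pvariance

# Made-up (Type, AdoptionSpeed, Fee) rows; Type 1 = Dog, 2 = Cat.
rows = [
    (1, 1, 0), (1, 1, 200),
    (1, 2, 0), (1, 2, 50),
    (2, 1, 0), (2, 1, 20),
    (2, 2, 0), (2, 2, 10),
]


def fee_variance(rows):
    """Population variance of Fee for each (Type, AdoptionSpeed) group."""
    groups = defaultdict(list)
    for pet_type, speed, fee in rows:
        groups[(pet_type, speed)].append(fee)
    return {key: pvariance(fees) for key, fees in groups.items()}


variances = fee_variance(rows)
for key in sorted(variances):
    print(key, variances[key])
```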
For cats, shorter fur may indicate a pure breed. For dogs, however, either long or short fur may indicate a pure breed. The total number of pure breed pets is larger than the total number of mixed breed pets.
The larger the number of pets represented in a profile, the lower the adoption fee tends to be. Also, most stray pets are free to adopt. In most cases, there is only one pet per adoption and the fee is less than 500.
This plot relates adoption speed to Age and PhotoAmt. The purple points, which mark the slowest adoption speed, mostly appear at low photo counts. This makes sense because the more photos are posted, the more likely people are to adopt the pet within a short time. Unexpectedly, Age appears uncorrelated with adoption speed, and we do not see a very clear joint pattern for Age and photo amount.
We can see from this plot that adoption speed is somewhat correlated with Fee and Quantity. If the quantity of pets is higher, adoption is faster. The opposite holds for Fee: if the fee is higher, adoption is slower.
We see that medium-size pets are the most likely to be adopted, whether dog or cat. Also, a large number of medium-size pets are adopted between 8 and 100 days.
We produced this report to give an overview of the pet adoption market in Malaysia. The adoption speed of pets depends on a variety of features. Some of those variables are bound together; for example, Vaccinated, Dewormed and Sterilized affect adoption speed together. Unexpectedly, fee is not a very important feature for adoption speed. People do not seem to care much about the price when they really want to adopt a pet. Besides that, we find an interesting phenomenon: people are more likely to adopt dogs than cats, and more likely to adopt female pets than male ones.
Because there are no pets in this dataset that waited between 90 and 100 days, we need to find more comprehensive data.
The PetFinder website also has descriptions and pictures for each pet. Since the words in the descriptions and the features of the images are related to adoption speed, we need to use those data to find more relationships.
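As a starting point for the description text, word frequencies can be counted after lowercasing, stripping punctuation and dropping stop words. The Python sketch below is only an illustration. The stop-word list is a small hypothetical one, the `word_counts` helper is our own name, and the two descriptions are made up.

```python
import string
from collections import Counter

# Small illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "for", "of", "very"}


def word_counts(descriptions):
    """Count words across descriptions after basic text cleaning."""
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for text in descriptions:
        words = text.lower().translate(strip_punct).split()
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up pet descriptions, not taken from PetFinder.
descriptions = [
    "Playful kitten, very healthy and playful!",
    "Healthy puppy looking for a loving home.",
]
counts = word_counts(descriptions)
print(counts.most_common(3))
```

The resulting counts could then be compared across adoption-speed categories to see which words appear more often in quickly adopted profiles.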